Learning Semantic Sub-graphs for Document Summarization

Authors

Jure Leskovec

Marko Grobelnik

Natasa Milic-Frayling

Abstract

In this paper we present a method for summarizing documents by creating a semantic graph of the original document and identifying the substructure of that graph that can be used to extract sentences for a document summary. We start with deep syntactic analysis of the text and, for each sentence, extract logical form triples (subject–predicate–object). We then apply cross-sentence pronoun resolution, co-reference resolution, and semantic normalization to refine the set of triples and merge them into a semantic graph. This procedure is applied both to the documents and to the corresponding summary extracts. We train a linear Support Vector Machine (SVM) on the logical form triples to learn to identify triples that belong to sentences in the document summaries. The classifier is then used to create document summaries automatically for the test data. Our experiments with the DUC 2002 data show that extending the attribute set to include semantic properties and topological graph properties of the logical form triples yields a statistically significant improvement in the micro-averaged F1 measure of the extracted summaries. We also observe that attributes describing various aspects of the semantic graph receive high weights in the learned SVM model.
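The pipeline described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation. Triples are assumed to be already extracted and normalized, with pronouns and co-references resolved. The attributes computed by the hypothetical `triple_features` are invented for illustration, since the abstract says only that semantic and topological graph attributes were used. The linear SVM is trained with the Pegasos stochastic sub-gradient method, which stands in for whatever SVM package the authors actually used. The hypothetical `micro_f1` treats each summary as a set of extracted sentence indices.

```python
import random
from collections import defaultdict


def build_semantic_graph(triples):
    """Merge (subject, predicate, object) triples into an undirected graph.

    Nodes are subject/object terms. Identical (already normalized) terms
    collapse into a single node, which is what links triples across sentences.
    """
    adj = defaultdict(set)
    for subj, _pred, obj in triples:
        adj[subj].add(obj)
        adj[obj].add(subj)
    return adj


def triple_features(triple, adj, sentence_index):
    """Hypothetical attribute vector for one triple (not the paper's feature set)."""
    subj, _pred, obj = triple
    return [
        1.0,                                # constant (bias) feature
        float(len(adj[subj])),              # degree of the subject node
        float(len(adj[obj])),               # degree of the object node
        float(len(adj[subj] & adj[obj])),   # neighbours shared by both ends
        1.0 / (1 + sentence_index),         # earlier sentences get larger values
    ]


def train_linear_svm(X, y, lam=0.01, epochs=100, seed=0):
    """Linear SVM trained with Pegasos (stochastic sub-gradient on hinge loss).

    Labels: +1 if the triple comes from a summary sentence, -1 otherwise.
    """
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    t = 0
    for _ in range(epochs):
        for i in rng.sample(range(len(X)), len(X)):  # shuffled pass over data
            t += 1
            eta = 1.0 / (lam * t)                     # decreasing step size
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, X[i]))
            w = [(1.0 - eta * lam) * wj for wj in w]  # shrink: L2 regularizer
            if margin < 1.0:                          # hinge loss is active
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
    return w


def predict(w, x):
    """Return +1 (keep the triple) or -1 (discard it)."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) > 0 else -1


def micro_f1(pred_sets, gold_sets):
    """Micro-averaged F1: pool TP/FP/FN over all documents before dividing."""
    tp = fp = fn = 0
    for pred, gold in zip(pred_sets, gold_sets):
        tp += len(pred & gold)
        fp += len(pred - gold)
        fn += len(gold - pred)
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0
```

For example, after building the graph from the triples `("court", "reject", "appeal")`, `("appeal", "file_by", "company")` and `("company", "announce", "merger")`, the triple from sentence 1 gets the vector `[1.0, 2.0, 2.0, 0.0, 0.5]`. Both `appeal` and `company` have degree 2, but they share no neighbours. Micro-averaging pools the counts across documents before computing F1, so documents with long extracts carry more weight than they would under per-document (macro) averaging.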

Similar resources

Impact of Linguistic Analysis on the Semantic Graph Coverage and Learning of Document Extracts

Automatic document summarization is a problem of creating a document surrogate that adequately represents the full document content. We aim at a summarization system that can replicate the quality of summaries created by humans. In this paper we investigate the machine learning method for extracting full sentences from documents based on the document semantic graph structure. In particular, we ...

Reconciling Event-Based Knowledge Through RDF2VEC

The reconciled knowledge graphs are typically used for multidocument summarization, or to detect knowledge evolution across document series. This paper focuses on reconciling knowledge graphs generated from two text documents about similar events described differently. Our approach employs and extends MERGILO, a tool for reconciling knowledge graphs extracted from text, using word similarity an...

Graph-Based Multi-Modality Learning for Topic-Focused Multi-Document Summarization

Graph-based manifold-ranking methods have been successfully applied to topic-focused multi-document summarization. This paper further proposes to use the multi-modality manifold-ranking algorithm for extracting topic-focused summary from multiple documents by considering the within-document sentence relationships and the cross-document sentence relationships as two separate modalities (graphs)....

Two-tier Architecture for Domain Specific Document Summarization Using Probabilistic Latent Semantic Analysis

In this research work we have proposed two-tier architecture for document summarization. This architecture minimizes the redundancy and boosts the information relevancy in the summary by applying Probabilistic Latent Semantic Analysis (PLSA) at two levels. It also enhances the summarizer’s speed by using Incremental Expectation Maximization algorithm for PLSA learning rather than Expectation Ma...

Semantic Graphs Derived From Triplets with Application in Document Summarization

Information nowadays has become more and more accessible, so much as to give birth to an information overload issue. Yet important decisions have to be made, depending on the available information. As it is impossible to read all the relevant content that helps one stay informed, a possible solution would be condensing data and obtaining the kernel of a text by automatically summarizing it. We ...

Publication date: 2004
